Generating optimal instruction sequences for bitwise logical expressions

ABSTRACT

A sequence generator generates a table of optimal instruction sequences for all bitwise expressions having a specific number of variables. An index generator generates a bit-string index that corresponds to a particular bitwise expression. The bit-string is generated from a truth table. A table lookup unit is coupled with the index generator. The table lookup unit finds an optimal instruction sequence for the bitwise expression from within the table of optimal instruction sequences based at least in part on the generated bit-string index.

CLAIM FOR PRIORITY

This patent application is a National Phase application under 35 U.S.C. §371 of International Application No. PCT/RU2006/000346, filed on Jun. 30, 2006, entitled “Generating Optimal Instruction Sequences for Bitwise Logical Expressions.”

FIELD

Embodiments of the invention relate to code optimization, and more particularly to generating optimal instruction sequences for bitwise logical expressions.

BACKGROUND

Most computer architectures and their corresponding instruction sets perform binary operations called bitwise operations. Bitwise operations are statements that contain two operands joined by a bitwise operator. The operation result is a numeric value, usually in binary form. For example, the 64-bit Intel Architecture (IA-64) has four operations: AND (A&B), OR (A|B), XOR (A^B), and ANDCM (A & (~B)). The 32-bit Intel Architecture (IA-32) has only three operations: AND, OR, and XOR (although the Streaming Single Instruction Multiple Data (SIMD) Extensions 3 (SSE3) instruction set includes all four operations). Bitwise expressions are statements that represent bitwise operations. In general, optimization of an arbitrary bitwise expression is Non-deterministic Polynomial-time (NP) hard.

The name “compiler” is primarily used for programs that translate a program written in a high level language, typically referred to as source code, into an executable program represented in a lower level language (e.g., assembly language or machine language), typically referred to as object code. Compiler optimization is the process of tuning the output of a compiler to minimize some attribute (or maximize the efficiency) of the object code. The most common requirement in compiler optimizations is to minimize the time taken to execute the object code. One way to minimize the time taken to execute a program is to minimize the number of instructions needed to compute the value of an expression. In traditional compiler science, circuit design associated with compilers and optimizations does not distinguish between expressions having large numbers of variables (e.g., more than 5) and expressions having a small number of variables. Computer programmers frequently employ a single hard-coded pattern for optimizing instructions associated with expressions having both large and small numbers of variables. This universal approach allows a compiler to affect a wide range of expressions, but does not necessarily minimize the complexity and/or the latency of the code associated with a particular expression, including an expression having a small number of variables.

BRIEF DESCRIPTION OF THE DRAWINGS

The following description includes discussion of various figures having illustrations given by way of example of implementations of embodiments of the invention. The drawings should be understood by way of example, and not by way of limitation.

FIG. 1 is a table illustrating the relationship between an optimal instruction sequence and a directed acyclic graph (DAG) according to an embodiment of the invention.

FIG. 2 is a block diagram illustrating an embodiment of a code generator.

FIG. 3 is a flow diagram illustrating an embodiment of the invention that optimizes code for a bitwise expression.

DETAILED DESCRIPTION

Embodiments of the invention described herein relate to linear-time optimal code generation for bitwise expressions of a small number of variables. Any bitwise expression can be considered a Boolean function/expression. In mathematics, a finite Boolean function is a function of the form f: B^(k)→B, where B={0, 1} is a Boolean domain and where k is a nonnegative integer. In the case where k=0, the “function” is simply a constant element of B. More generally, a function of the form f: X→B, where X is an arbitrary set, is a Boolean-valued function. If X=M={1, 2, 3, . . . }, then f is a binary sequence, that is, an infinite sequence of 0's and 1's. If X=[k]={1, 2, 3, . . . , k}, then f is a binary sequence of length k.

A Boolean function on n variables can be represented as a bit-string of 2^(n) bits, corresponding to the truth table (a mathematical table used in logic to determine whether an expression is true or valid) for the function. There are 2^((2^n)) different Boolean functions on n variables. For example, for n=4, there are 65,536 Boolean functions; for n=5, there are 4,294,967,296 Boolean functions. The bit-string representing a Boolean function can be considered as the function's index, having a range from 0 to (2^((2^n)))−1. A linear algorithm is used to compute the index of the given Boolean function/expression. The computed index is then used to look up a table entry that identifies an optimal instruction sequence (discussed in detail below). Thus, an entry with index N represents an optimal instruction sequence for Boolean function N.
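By way of illustration only, the counts stated above can be checked with a few lines of Python. The function names are hypothetical and are not part of the described embodiments.

```python
def count_boolean_functions(n: int) -> int:
    """Number of distinct Boolean functions on n variables: 2^(2^n)."""
    return 2 ** (2 ** n)


def max_index(n: int) -> int:
    """Largest truth-table index for n variables: 2^(2^n) - 1."""
    return count_boolean_functions(n) - 1


for n in (2, 3, 4, 5):
    print(n, count_boolean_functions(n), max_index(n))
```

The rapid growth of the count with n suggests why a table-driven approach is described for a small number of variables.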

As mentioned above, the index for a given function is used to look up a table entry. Entries in the table contain optimal instruction sequences. In an embodiment associated with bitwise expressions having four (4) variables, the table will consist of 65,536 different entries. A sequence generator may be used to generate the table of optimal instruction sequences (discussed in further detail below). Alternatively, the table of optimal sequences may be imported from an external source.

Optimality of an instruction sequence for a given bitwise expression is defined by the following requirements: 1) the number of instructions for the bitwise expression must be minimal (minimal complexity), and 2) the height of a directed acyclic graph (DAG) formed by the instruction sequence must be minimal (minimal height). A DAG is a directed graph with no directed cycles. This means that for any vertex v, there is no nonempty directed path starting and ending on v.

FIG. 1 illustrates three (3) examples of optimal instruction sequences for a set of input expressions. The DAG in each example visually illustrates the requirements (minimal complexity and minimal height) for optimality of the respective instruction sequence. In Example 1, the complexity of the DAG is two (2) (because there are two instructions). The height of the DAG is also two (2) (because the DAG is one (1) level deeper than the level of the root node). In Example 2, the DAG has a complexity of three (3) and a height of two (2). The DAG of Example 3 has a complexity of three (3) and a height of three (3).
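The two optimality measures can be sketched in Python as follows. This is a minimal sketch under stated assumptions: an instruction is modeled as a (destination, opcode, source, source) tuple, every source is either an input variable or an earlier destination, and height is counted as the longest chain of dependent instructions. The three sequences are hypothetical and merely reproduce the complexity and height values given for the examples; they are not the sequences drawn in FIG. 1.

```python
from typing import Dict, List, Tuple

# Assumed instruction model: (destination, opcode, source1, source2).
Instruction = Tuple[str, str, str, str]


def complexity(seq: List[Instruction]) -> int:
    """Complexity is the number of instructions in the sequence."""
    return len(seq)


def height(seq: List[Instruction]) -> int:
    """Longest chain of dependent instructions; input variables are level 0."""
    level: Dict[str, int] = {}
    for dest, _op, src1, src2 in seq:
        level[dest] = 1 + max(level.get(src1, 0), level.get(src2, 0))
    return max(level.values(), default=0)


# Hypothetical sequences (not taken from FIG. 1).
chain = [("t1", "and", "A", "B"), ("t2", "or", "t1", "C")]
balanced = [("t1", "and", "A", "B"), ("t2", "xor", "C", "D"),
            ("t3", "or", "t1", "t2")]
deep = [("t1", "and", "A", "B"), ("t2", "xor", "t1", "C"),
        ("t3", "or", "t2", "D")]

for name, seq in (("chain", chain), ("balanced", balanced), ("deep", deep)):
    print(name, complexity(seq), height(seq))
```

The sequences balanced and deep have the same complexity but different heights, which is the situation the second optimality requirement is meant to distinguish.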

As mentioned above, a sequence generator generates a table of 2^((2^n)) optimal instruction sequences. A standard algorithm is used to generate the table of optimal instruction sequences. In one embodiment, the sequence generator includes an algorithm that iterates through all possible permutations of n variables and all possible permutations of instructions having n variables based on the IA-64 instruction set, developed by Intel Corporation of Santa Clara, Calif. In other embodiments, the algorithm may be based on other instruction sets, such as the IA-32 instruction set (also developed by Intel Corporation of Santa Clara, Calif.). The algorithm then iterates through all associated DAGs to determine an optimal instruction sequence for each entry in the table. The table of optimal instruction sequences need only be generated once, after which the table is stored, for example, in a memory or cache (to be accessed during a table lookup).
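The description does not spell out the generation algorithm, so the following Python sketch substitutes a plain breadth-first search over straight-line programs; it should not be read as the algorithm of the embodiments. It records only the minimal instruction count for each reachable truth-table index and leaves out the DAG-height tie-break. The function names, the bit ordering of the index (bit i holds the result for input assignment i), and the choice of n=2 are all assumptions. Because this sketch supplies only the four operations AND, OR, XOR, and ANDCM and no constants, it reaches only functions that return 0 when every input is 0.

```python
from itertools import product
from typing import Callable, Dict, List


def variable_tables(n: int) -> List[int]:
    """Truth-table index of each input variable (bit i = value at assignment i)."""
    rows = 1 << n
    return [sum(((i >> j) & 1) << i for i in range(rows)) for j in range(n)]


def minimal_instruction_counts(n: int, max_instructions: int) -> Dict[int, int]:
    """Map each reachable truth-table index to its minimal instruction count."""
    mask = (1 << (1 << n)) - 1
    ops: List[Callable[[int, int], int]] = [
        lambda a, b: a & b,          # AND
        lambda a, b: a | b,          # OR
        lambda a, b: a ^ b,          # XOR
        lambda a, b: a & ~b & mask,  # ANDCM
    ]
    start = frozenset(variable_tables(n))
    best = {table: 0 for table in start}  # inputs cost nothing
    frontier = {start}                    # each state = set of values computed so far
    for count in range(1, max_instructions + 1):
        next_frontier = set()
        for state in frontier:
            for a, b in product(state, repeat=2):
                for op in ops:
                    value = op(a, b)
                    if value in state:
                        continue  # recomputing a known value is never minimal
                    best.setdefault(value, count)
                    next_frontier.add(state | {value})
        frontier = next_frontier
    return best


counts = minimal_instruction_counts(2, 2)
for index in sorted(counts):
    print(format(index, "04b"), counts[index])
```

The number of states grows very quickly with n and with the instruction limit, so a practical generator for n=4 would need pruning or a different enumeration strategy; the sketch is intended only to show the shape of an exhaustive search.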

FIG. 2 illustrates an embodiment of a code generator 200. Sequence generator 210 generates 2^((2^n)) instruction sequences for all bitwise expressions having n variables. The optimal instruction sequences are stored in table 240, which can be implemented using any type of memory, including SRAM, DRAM, Flash, PROM, EPROM, etc.

An input expression is received at an index generator 220. Index generator 220 determines a Boolean function associated with the bitwise expression. Each variable in the Boolean function is considered an “essential variable” and is assigned to a single bit in a bit-string of length n, with a maximum of one variable per bit. Lower-order bits are assigned before higher-order bits in the bit-string. More specifically, a first variable is assigned to the lowest-order bit in the bit-string; the next variable is assigned to the next lowest-order bit of the bit-string. This process continues until the last variable is assigned to the lowest-order bit still available.

The essential variables in the bit-string can be arranged in any order. For example, where n=4 and there are four variables (A, B, C, and D, respectively) in the input expression, the bit-string may be ordered as ABCD, BCDA, or any other order. However, if n=4 and there are only three essential variables (E, F, and G, respectively) in the input expression, the three essential variables are assigned to the three lowest-order bits of the bit-string. A non-essential variable, or place-holder, (e.g., X) is then assigned to the highest-order bit of the bit-string. Thus, the essential variables can be in any order as long as any non-essential variables (e.g., X) occupy the highest-order bits (e.g., XGFE, XEGF, XFEG, etc.).
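The assignment of variables to bit positions can be sketched as follows. This is an illustrative sketch only: the helper names and the use of a list whose element 0 is the lowest-order bit are assumptions, and the rendering step assumes single-character variable names.

```python
from typing import List

PLACEHOLDER = "X"  # assumed marker for a non-essential slot


def assign_bits(essential: List[str], n: int) -> List[str]:
    """Return slots[0..n-1], where slots[0] is the lowest-order bit."""
    if len(essential) > n:
        raise ValueError("expression has more than n variables")
    # Essential variables fill the low-order slots; placeholders fill the rest.
    return list(essential) + [PLACEHOLDER] * (n - len(essential))


def as_bit_string(slots: List[str]) -> str:
    """Render the slots with the highest-order bit first."""
    return "".join(reversed(slots))


print(as_bit_string(assign_bits(["E", "F", "G"], 4)))       # XGFE
print(as_bit_string(assign_bits(["D", "C", "B", "A"], 4)))  # ABCD
```

Passing the essential variables in a different order yields the other permitted arrangements, such as XEGF or XFEG, while the placeholder always remains in the highest-order position.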

Once the variables for the Boolean function have been arranged as a bit-string, the index generator 220 generates a truth table for the Boolean function. A truth table is generated by determining whether the Boolean function is TRUE (i.e., a logical 1) or FALSE (i.e., a logical 0) given a set of binary input values for the variables in the bit-string. For example, the Boolean function of Example 3 in FIG. 1 has three (3) essential variables. For n=4, there are sixteen (16) different combinations of binary input values for the four (4) variables in the bit-string. Thus, there are sixteen (16) total results, one for each of the sixteen (16) input combinations. The results, in binary form, form the index of the Boolean function. For example, the index of the Boolean function of Example 3 in FIG. 1 is 0101010001010100. Having calculated the index for the Boolean function, the index generator sends the index to table lookup unit 230.
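A truth-table index of this kind can be computed with a short loop, sketched below in Python. The bit ordering is an assumption of this sketch (bit i of the index holds the result for input assignment i, and variable j takes the value of bit j of i); the description does not state which ordering the embodiments use. The expression (E & F) | G is hypothetical and is not Example 3 of FIG. 1.

```python
from typing import Callable


def truth_table_index(func: Callable[..., int], n: int) -> int:
    """Evaluate func on all 2^n assignments and pack the results into an integer."""
    index = 0
    for i in range(1 << n):
        args = [(i >> j) & 1 for j in range(n)]  # variable j = bit j of i
        if func(*args) & 1:
            index |= 1 << i
    return index


def example(e: int, f: int, g: int, x: int) -> int:
    # Hypothetical expression (E & F) | G; the placeholder x is ignored.
    return (e & f) | g


index = truth_table_index(example, 4)
print(format(index, "016b"))  # 1111100011111000
```

The upper and lower eight bits of the printed index are identical because the placeholder in the highest-order position does not influence the result. The loop performs one evaluation per truth-table row.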

Table lookup unit 230 uses the received index to look up an entry in table 240. In one embodiment, table lookup unit 230 performs a direct table lookup to find an entry in the table. In other embodiments, table lookup unit 230 may perform a hash table lookup or other form of table lookup. The instruction sequence found by performing the table lookup is the optimal instruction sequence for the input expression.
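The difference between a direct lookup and a hash lookup can be sketched as follows for n=2. The table contents are hypothetical and only partially filled, the instruction strings are illustrative pseudo-assembly rather than actual IA-64 syntax, and the index convention (A = 0b1010, B = 0b1100) is an assumption of this sketch.

```python
from typing import Dict, List, Optional

# Direct table: one slot per possible index (2^(2^n) = 16 slots for n = 2).
direct_table: List[Optional[List[str]]] = [None] * 16
direct_table[0b1000] = ["and r = A, B"]  # hypothetical entry for A & B
direct_table[0b0110] = ["xor r = A, B"]  # hypothetical entry for A ^ B

# Hash table: only the populated indices are stored.
hash_table: Dict[int, List[str]] = {
    0b1000: ["and r = A, B"],
    0b0110: ["xor r = A, B"],
}


def lookup_direct(index: int) -> Optional[List[str]]:
    """The index is used directly as an array position."""
    return direct_table[index]


def lookup_hash(index: int) -> Optional[List[str]]:
    """The index is used as a key; missing entries yield None."""
    return hash_table.get(index)


print(lookup_direct(0b1000), lookup_hash(0b0110))
```

A direct table needs one slot for every possible index, whereas a hash table can hold a sparse subset of entries at the cost of hashing each key.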

FIG. 3 illustrates an embodiment of the invention that optimizes code for a bitwise expression. A table of optimal instruction sequences is provided 310. In one embodiment, the table of optimal sequences is provided by a sequence generator that generates the optimal sequences and places them in the table. In another embodiment, the table of optimal sequences is provided by downloading or importing pre-computed table entries from an external source.

A code generator receives an input bitwise expression 320. The generator then determines, 330, whether the number of variables in the received expression is less than or equal to a pre-defined number, n. If not, the expression is discarded or ignored by the generator (but a sub-expression of the larger expression may still be optimized). If the number of variables in the expression is less than or equal to n, then each variable is assigned to a bit in a bit-string of length n. Lower-order bits are reserved for essential variables. A variable is essential if the value of an expression depends on the value of the variable.
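The definition of an essential variable can be tested directly on a truth-table index, as in the following sketch. It assumes the same hypothetical bit ordering as before (bit i of the index holds the result for input assignment i, with variable j taken from bit j of i), and the function names are illustrative.

```python
from typing import List


def is_essential(index: int, j: int, n: int) -> bool:
    """Variable j is essential if flipping it changes the result for some input."""
    for i in range(1 << n):
        flipped = i ^ (1 << j)  # same assignment with variable j inverted
        if ((index >> i) & 1) != ((index >> flipped) & 1):
            return True
    return False


def essential_variables(index: int, n: int) -> List[int]:
    """Positions of all variables on which the function actually depends."""
    return [j for j in range(n) if is_essential(index, j, n)]


# The index quoted in the description for Example 3, read under the
# bit ordering assumed in this sketch.
print(essential_variables(0b0101010001010100, 4))  # [0, 1, 2]
```

Under this assumed ordering, the index quoted for Example 3 depends on the three lowest-order positions and not on the highest-order one, which agrees with the statement that the function has three essential variables.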

Once variables have been assigned in the bit-string, a truth table is generated for the bit-string using a Boolean function that is mathematically equivalent to the input expression 340. The resulting truth table forms an index of length 2^(n). Using the index, the code generator performs a table lookup to select an optimal instruction sequence for the bitwise expression 350. The table lookup can be a direct table lookup, a hash table lookup, or any other table lookup.

Embodiments of the invention described above may include hardware, software, and/or a combination of these. In a case where an embodiment includes software, the software data, instructions, and/or configuration may be provided via an article of manufacture by a machine/electronic device/hardware. An article of manufacture may include a machine accessible/readable medium having content to provide instructions, data, etc. The content may result in an electronic device, for example, a disk, or a disk controller as described herein, performing various operations or executions described. A machine accessible medium includes any mechanism that provides (i.e., stores and/or transmits) information/content in a form accessible by a machine (e.g., computing device, electronic device, electronic system/subsystem, etc.). For example, a machine accessible medium includes recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.). The machine accessible medium may further include an electronic device having code loaded on a storage that may be executed when the electronic device is in operation. Thus, delivering an electronic device with such code may be understood as providing the article of manufacture with such content described above.

As used herein, references to one or more “embodiments” are to be understood as describing a particular feature, structure, or characteristic included in at least one implementation of the invention. Thus, phrases such as “in one embodiment” or “in an alternate embodiment” appearing herein describe various embodiments and implementations of the invention, and do not necessarily all refer to the same embodiment. However, they are also not necessarily mutually exclusive. The above descriptions of certain details and implementations, including the description of the figures, may depict some or all of the embodiments described above, as well as discussing other potential embodiments or implementations of the inventive concepts presented herein.

Besides what is described herein, various modifications may be made to the disclosed embodiments and implementations of the invention without departing from their scope. Therefore, the illustrations and examples herein should be construed in an illustrative, and not a restrictive sense. The scope of the invention should be measured solely by reference to the claims that follow.

What is claimed is:
 1. A hardware memory including a code generator for bitwise expressions, comprising: a sequence generator to generate optimal instruction sequences for the bitwise expressions and to populate a table with the optimal instruction sequences; an index generator to generate a bit-string index for a bitwise expression based at least in part on a truth table corresponding to the bitwise expression; and a table lookup unit coupled with the index generator to find an optimal instruction sequence for the bitwise expression from the table of optimal instruction sequences based at least in part on the bit-string index, wherein the optimal instruction sequence comprises a minimal number of instructions for computing the bitwise expression, and wherein a directed acyclic graph (DAG) corresponding to the optimal instruction sequence has a minimal height compared to other DAGs having the minimal number of instructions.
 2. The hardware memory of claim 1, wherein the table lookup unit is operable to be accessed via direct table lookup.
 3. The hardware memory of claim 1, wherein the table lookup unit is operable to be accessed via hash table lookup.
 4. A method performed by a processor for generating code for bitwise computation of bitwise expressions, comprising: receiving a bitwise expression having a plurality of variables that is less than or equal to a pre-defined number, and for each variable in the bitwise expression, assigning the variable to a lowest-order available bit in a bit-string having a bit-length equal to the pre-defined number; generating a truth table for the bit-string based at least in part on a Boolean function corresponding to the bitwise expression; and performing a table lookup in a table of optimal instruction sequences based at least in part on the truth table to select an optimal instruction sequence for the bitwise expression, wherein the instruction sequence is optimal if: the instruction sequence has a minimal number of instructions for the bitwise expression, and a directed acyclic graph (DAG) corresponding to the instruction sequence has a minimal height as compared to other DAGs having the minimal number of instructions.
 5. The method of claim 4, wherein the pre-defined number of variables is four (4).
 6. The method of claim 4, wherein performing the table lookup comprises performing a direct table lookup.
 7. The method of claim 4, wherein performing the table lookup comprises performing a hash table lookup.
 8. The method of claim 4, wherein the Boolean function comprises one or more bitwise binary operations selected from the group of bitwise binary operations consisting of AND, OR, exclusive OR (XOR), AND with complement (ANDCM), and a combination thereof.
 9. An article of manufacture comprising a non-transitory machine-accessible medium having content to provide instructions to result in an electronic device performing operations that enable a compiler to minimize instructions for computing a bitwise expression, including: determining a Boolean function for the bitwise expression; generating an index from a truth table corresponding to the Boolean function; and performing a table lookup of pre-determined optimal instruction sequences based at least in part on the index to find an optimal instruction sequence for the bitwise expression, wherein the instruction sequence is optimal if: the instruction sequence has a minimal number of instructions for the bitwise expression, and a directed acyclic graph (DAG) corresponding to the instruction sequence has a minimal height as compared to other DAGs having the minimal number of instructions.
 10. The article of manufacture of claim 9, wherein the non-transitory machine accessible-medium includes further content to define the Boolean function as comprising bitwise binary operations selected from the group consisting of AND, OR, exclusive OR (XOR), and a combination thereof.
 11. The article of manufacture of claim 9, wherein the non-transitory machine accessible-medium includes further content to define the Boolean function as comprising bitwise binary operations selected from the group consisting of AND, OR, exclusive OR (XOR), AND with complement (ANDCM), and a combination thereof.
 12. The article of manufacture of claim 9, wherein performing the table lookup comprises performing a direct table lookup.
 13. The article of manufacture of claim 9, wherein performing the table lookup comprises performing a hash table lookup.
 14. A system, comprising: a sequence generator to generate optimal instruction sequences for all bitwise expressions having the number of variables and to populate a table with the optimal instruction sequences; a dynamic random access memory (DRAM) coupled to the sequence generator to store the table of optimal instruction sequences; an index generator to generate a bit-string index for a bitwise expression based at least in part on a truth table corresponding to the bitwise expression; and a table lookup unit coupled with the index generator and the DRAM to find an optimal instruction sequence for the bitwise expression from the table of optimal instruction sequences based at least in part on the bit-string index, wherein an instruction sequence is optimal if: the instruction sequence requires a minimal number of instructions for computing the bitwise expression, and a directed acyclic graph (DAG) corresponding to the instruction sequence has a minimal height compared to other DAGs having the minimal number of instructions.
 15. The system of claim 14, wherein the table lookup unit is operable to be accessed via direct table lookup.
 16. The system of claim 14, wherein the table lookup unit is operable to be accessed via hash table lookup. 